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We demonstrate that is it possible to simulate a system in thermal equilibrium even when the 
energy cannot be evaluated exactly, provided the error distribution is known. This leads to an 
effective optimisation strategy for problems where the evaluation of each design can only be sampled 
statistically. 
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We consider thermal equiHbrium simulation of systems 
in which the energy of any given state is either not known 
exactly, or else can much more cheaply be estimated. A 
classic example which occurs in Physics is where each en- 
ergy calculation itself involves sampling over a distribu- 
tion or numerical integration, or the estimation of param- 
eters of a numerical model. In this letter we show how, 
by Stochastic Annealing, thermal equilibrium distribu- 
tions can nevertheless be sampled exactly, the essence of 
our method being that the energy errors can be precisely 
absorbed as a contribution to thermal noise. 

Our thermal sampling technique can be applied to opti- 
misation problems where the objective function is analo- 
gously difhcult to evaluate, using Simulated Annealing 
(meaning simulated cooling) or related methods 0,1 
As an example we consider designing model protein 
molecules to fold as fast as possible, where the only 
way to evaluate a particular design is to run a sample 
of folding simulations. Confronted by similar problems 
others have developed more empirical methods ^ |^ |^ , 
but none of these is underpinned by simulation of true 
thermal equilibrium. 

In a thermal ensemble the probability of the system 
occupying a state with energy i?(/i) is 

P{^i) cx e"''^^'') (1) 

where /3 = ^ is the inverse temperature. It is conve- 
nient to sample this distribution by a Markov process, in 
which the system is allowed to make a transition (move) 
from one state to another with rate constant K{ji ^ v) 
which depends only on the two states concerned. This is 
generally more efficient than trying to choose the states 
directly, provided we can assume that the move set is 
ergodic, meaning that all states can be reached (even- 
tually) from any given starting state. The more strict 
condition of detailed balance, 

p{^Ji)K{^i^v)^p{u)K{v^ ^Ji), (2) 

imposes the correct equilibrium distribution provided 
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where AE is the energy difference E^i/) — E{fi). 

Whilst eq. Q ensures thermal equilibrium at inverse 
temperature /3, it does not fully determine the form of 
K. Typically K{fi — > v) is the combination of an at- 
tempt frequency to move to state v given that the system 
is in state /i, multiplied by an acceptance probability. 
For simplicity of exposition we will take all the attempt 
frequencies to be equal to unity, so that K is just an 
acceptance probability and so must obey < < 1. 
The Metropolis algorithm0 is fully specified by requir- 
ing that K{AE) = 1 for AE < C, with C as large as 
possible (maximising acceptance rates) leading to 



KMotropoUAE) - min (l, e-P^^) 



(4) 



whereas the Glauber acceptance function ^] arises by 
requiring that K{AE) + K{-AE) = 1, leading to 



ifGlaubcr(Ai;) = 1/ (l 



(5) 



These are compared graphically in Fig. ^ 

We now consider the case when the true energy change 
is not known exactly, and we must accept moves with 
probabilty A{x) where x is only an estimate of the en- 
ergy change. We will assume that these estimates have 
statistically independent errors. If f{x\AE) is the prob- 
ability density of estimating that the energy change is 
X when its true value is AE, then the net probability 
of accepting a move whose true energy change is AE is 
given by 



K{AE) = f{x\ AE) A{x) dx. 
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(3) 
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It is our aim to choose A{x) such that K{AE) satisfies 
detailed balance 

We can gain insight by considering the crude choice 
A{x) = 1 for x < and A{x) — otherwise. This simple 
strategy can give a good aproximation to eq. ||2J) of great 
value in optimisation. The resulting acceptance function 
K when / is given by a Gaussian distribution is simply 
related to the standard Error Function and is graphed in 
Fig. nj The resemblance between this and the Glauber 
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FIG. 1: A simple approximate stochastic annealing is ob- 
tained by accepting moves on the basis of the sign of the 
estimated energy change x. The acceptance probability as 
a function of the true underlying energy change AE is then 
given by a convolution with the error distribution. As shown 
for the Gaussian distribution case, this gives an excellent 
approximation to the exact Glauber acceptance rule. 



acceptance function (whose symmetry it shares) is strik- 
ing, showing how the random energy errors make the 
selection look thermal - although in this case the match 
is not exact, so detailed balance is not strictly achieved. 
The standard deviation a of the Gaussian distribution 
controls an approximate effective temperature using this 
rule, as inverting eq. © gives 



order (AS/rf . (7) 



-a (l- 0.018 {AE/Tf 



For many optimisation purposes the departure from de- 
tailed balance, due to the energy dependent terms at 
large \AE/T\, is not a problem. Fig. 12 shows how we 
successfully used this approach to optimise model protein 
folding: for each design change considered we obtained 
estimates of the change in mean folding time from a lim- 
ited sample of folding simulations, and as the annealing 
proceeded we gradually cooled the system by increasing 
the sample sizes leading to reduced error size and re- 
duced temperatures through Eq[71 In another paper 
we show that this approach can be exploited to unravel a 
benchmark problem in stochastic optimisation, the prob- 
abilistic travelling salesman problem 0, ^| . 

We now attempt directly to choose A{x) properly, such 
that K exactly satisfies detailed balance, eq.|Hl assuming 
that the error distribution f{x\AE) is fully known. We 
first note that this leads to some bounds on the behaviour 
of the right tail of A{x) and a left tail of f{x\AE). 

Given that K{AE) < 1 even for negative argu- 
ments, the detailed balance condition requires J_^fix\ 
AE) A{x) dx < e~^^^ and for large positive AE this 
severely restricts all contributions to the LHS. First con- 
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FIG. 2; Stochastic annealing applied to protein folding, in a 
simple cubic lattice model. The chains were 27 units long and 
the space of amino acid sequences was explored for the ability 
of the molecule to fold spontaneously to a fixed 3x3x3 cubic 
target conformation, (a) The upper panel shows the result 
of two stochastic annealing simulations in which successive 
mutations of a starting sequence were accepted if a limited 
sampling of the Mean First Passage Time (to the target) was 
improved. Increasing the depth of sampling as the anneal- 
ing proceeded (j/-axis) provided the analogue of lowering the 
temperature, (b) For comparison the lower panel shows the 
result of an extensive search over sequences guided by thermal 
stability (energy in the target conformation). Clearly indi- 
vidiual runs of Stochastic Annealing find folding speeds ap- 
proaching the fastest available, and much better than would 
be achieved by seeking minimum energy (the Shaknovitich 
scheme fl^). 



sider x ~ AE for which f{x \ AE) is not expected to be 
small. Then the exponential decay on this contribution 
must come from A{x) falling off at least as fast as e~^^ as 
x — > cxD, which becomes important in our later analysis. 
Second consider a; < 0, for which A{x) cannot become 
small or we would be heavily rejecting even moves which 
appear to be downwards in energy. Then the exponen- 
tial decay on this contribution must come from f{x\AE), 
X <0, falling off at least as fast as e~^^^ for AE oo. 
This latter sets a fundamental restriction on our stochas- 
tic annealing: the probability for large energy increases 
to be estimated as negative must fall off faster than a 
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thermal Boltzmann factor. 

Now we begin direct analysis of the detailed balance 
condition inroducing the substitutions f{x\AE) = 

e 2 a 



et(^-^-^)g(a;,A^) and A(a;) = 
and (O combined simplify down to 



(x) so that eqs. © 

a{x) (g{x, AE) - g{x, -AE)) dx = 0. (8) 
Without significant loss of generality we can try 

/oo 
h{AE')j{x,AE')dAE' (9) 
-OO 

where h and j are functions to be chosen. Substituting 
this into equation [SI we find 

/oo 
h{AE') [k{AE, AE') - k{-AE, AE')] dAE' = 
-OO 

(10) 

where 

/oo 
g{x,AE)j{x,AE')dx. (11) 
-OO 

Then eq. UUI is satisfied when h{AE') h{-AE') and 
k{AE, AE') = k{-AE, -AE') are even functions in the 
given sense. We have not succeeded in taking this most 
general case significantly further, the hard part being to 
implement < A{x) < 1 for a probability. 

For further progress we now specialise to the case of in- 
variant error distributions, where f{x\AE) ~ f(x—AE), 
meaning that the distribution of error is independent of 
AE itself. For this case, we can choose j{x,AE') = 
g{x - AE') and then from eq.[Tllwe find k{AE, AE') is 
a suitably even function. Then we have a solution to eq 
|H1 given by 

/oo 
h{AE')g{x- AE')dAE' (12) 
-C30 

provided /i(x), and correspondingly h{p) below, is even. 

We now aim to choose h to maximise the move accep- 
tance rates, which are governed by A(a;). We follow the 
Metropolis methodology in choosing A{x) (= e~^a{x)) 
to be identically 1 below some threshold, x < C . Then 
using the Fourier-Laplace transform defined by f{p) = 



-px 



f{x) dx, the transform of a becomes 



a{p)= I e-e-P'=dx+ b{x)e-P'' dx (13) 
) Jc 



where b{x) = A{x)e^ for x > C, b{x) = for x < C. 
From ea. (|12|l the transform of the new function b{x) is 
given by 



b{p) = h{p)g{p) g . 



(14) 



The exponential bound we established on the right tail 
of A{x) implies that b{p) is bounded for all Rep > 
whereas the integrability of f{x) together with the ex- 
ponential bound on its left tail only imply that g{p) is 
bounded for — < Rep < |. Thus h{p) must be cho- 
sen to cancel both any divergences of g{p) in Rep > ^ 
and the apparent pole at p = j3/2. We first turn to the 
Wiener-Hopf method 0| to define 



9{p) = 9l{p)9r{p) 



(15) 



where gLip) is bounded and non-zero for Rep < ^, and 
cjfiip) is bounded and non-zero for Rep > § ! it also 
follows from the bounded window for g{p) that gnip) 
is bounded for the wider range Rep > -|. Then 



by choosing h(p) = , , „ , — ~ / ^~ , — r , we can en- 
^ ' ((/3/2)^-p2) gL{p)gL(-p) ' 

sure that h{p)g{p) = (■^^^^^2_p2) ^^^^ bounded 

for Rep > — I except for the (desired) pole at p = ^. 

Choosing the constant B = (3 —tiiJ makes the residue 

of this pole cancel when we reassemble the expression 
(flUl for b(p) to give 



b{p) 



=(|-p)c 



/3 



9r{p)9l -| 



P ((/3/2)^-p2).gJ,(f)g2(_p)' 



(16) 

This should be the optimal solution. We cannot in- 
corporate any new factors into h{p) because, being even, 
they would have to be bounded both for Re{p) > — f 
and for Re{p) > ^, and we would run up against the 
limitations of Liouville's Theorem. The parameter C 
might appear to be a remaining degree of freedom, but 
it drops out of the acceptance function itself which is 
given directly in terms of 



a(p) = h(p)gip) 



9R 



{P)9L (-1) 



((/3/2)^~p2)g}j(f)gI(-p)' 



(17) 



The parameter C drops out because it simply reflects the 
partition we introduced in equation l|13|) . The feature 
which we have not strictly guaranteed is that A(x) = 1 
for X < C rather than some different cutoff. 

As an example of our approach above, consider the 
simple case where 



AE) = 2e-^l^-^-^l 



which leads to g{p',j) 



(7-#-p)(7 + |+p) 



(18) 



The choice 



of gL and gjj is trivial by inspection, giving a(p) = 



2+f3 2 — 2+P. ^ ^j^g inverse transformation can also 



be performed by inspection to give 
A{x) = min (l, ^'~/' e-^" 



^g-(7+/3):r^ 



(19) 
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We note that this acceptance function only remains pos- 
itive for X ^ when 7 > /3, which is precisely the 
limit of achievable stochastic annealing discussed earlier, 
and the case 7 — *■ 00 also duly recovers the straightfor- 
ward Metropolis method. We have analysed other simple 
cases such as a rectangular error distribution, the super- 
position of two exponentials as above, and the multi- 
ple convolution of exponentials, all leading to results of 
equivalent properties. 

The analysis of a Gaussian error distribution turns out 
to be slightly singular, in that its corresponding g{p) 
has no obvious Wiener-Hopf factorisation. However we 
can approach it by considering the case where x is taken 
to be a sum of N independent exponential-distributed 
variables, the error distribution taking the solvable form 
of multiply convolved exponentials and (by the Central 
Limit Theorem) approaching Gaussian form as ^ cxd. 
In this case g{p) — g{p]j)^ with 7 = ^/2N/a, where 
a is the standard deviation of x, leading to a(p) — 







N 







„/3(p-/3/2)f7V2 



as 



N 00 at fixed a. The corresponding acceptance func- 
tion by inverse transformation is then given by 



A{x) = min I 1, e 



-pix+f3a^/2) 



(20) 



The optimal acceptance rules found above all obey 
the bounds that < A{x) < 1 required for a proba- 
bility, but they only do so a postiori so it can be ob- 
jected that our optimal method gives no guarrantee of 
this outcome a priori. We have discovered a general but 
sub-optimal solution for the acceptance function which 
does assure the requirement. The key to this solution 
is to note that, given the result 1171 our requirements 
on the factorisation of g{p) can be relaxed to require 
only that gii{p) is bounded for Rep > and that 
9l{p) is non-zero for Rep < ^. Then assuming that 
J{x) — for a; < C, an acceptable factorisation is given 
by fffl(p) = g{p)e'P^ , gE{p) = e~P^ leading directly to 
a(j>) = h{p)g{p) = j^j^j^TZ^j^, at which point we 

can let C —00 so no significant new restriction has 
been imposed on /. The resulting acceptance function 
by inverse transformation can be expressed as the convo- 
lution 



Aix) 



1 



/(/?) 



i^Motiopoiis(a; -y) e '^^/(y) dy. (21) 



It can be verified by direct substitution that this docs 
obey the detailed balance condition |(2Jl, it is manifestly 
positive definite, and it gives maximum acceptance 1 as 
X — > — cxD. Simple comparison shows the resulting ac- 
ceptance probabilities are lower than the optimal values 
which we have calculated these, and the key difference 
is that an interval where A = 1 is no longer imposed. 

Given the wide impact of equilibrium statistical me- 
chanics, we are confident that our new method of exactly 



thermal stochastic annealing will find significant direct 
application. The chief limitation is that the error distri- 
bution must be known. In this context the Gaussian case 
is particulary important as it can be approached simply 
through multiple sampling, although its variance is also 
required. Estimating the variance from the same sample 
does induce errors, but these can be quantified and be- 
come negligible as the sample size becomes large It 
is a more open question whether it is viable to estimate 
by measurement the full distribution of error and apply 
this to eq. (PT|l . 

Underpinned by the exact results our analysis provides 
a powerful new tool in stochastic optimisation. Gener- 
ally it will be sufficient and convenient to use the approx- 
imately thermal method we presented, of simply accept- 
ing moves which appear to be an improvement. Then all 
the benefits of simulated annealing are obtained by delib- 
erately using crude estimates for each decision! In the 
protein folding problem we used limited samples of fold- 
ing time for these estimates, making the high tempera- 
ture part of the annealing computationally the cheapest. 
This interestingly complicates the long noted problem of 
how to choose the optimum cooling schedule, because 
cooling by reducing sample errors introduces computa- 
tional costs growing as l/T^ per move. 
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